ABSTRACT

This paper presents a real-time additive sound synthesis application
with individual outputs for each partial and noise component. The
synthesizer is programmed in C++, relying on the JACK API for audio
connectivity and on an OSC interface for control input. In connection
with OSC-capable spatial rendering software, these features allow the
individual spatialization of the partials and noise bands, referred to
as spectro-spatial synthesis. Additive synthesis is performed in the
time domain, using partial trajectories previously extracted from
instrument recordings. Noise is synthesized using Bark band energy
trajectories. The sinusoidal data set for the synthesis is generated
in advance from a custom violin sample library. Spatialization is
realized using established rendering software implementations on a
dedicated server. Pure Data is used for processing control streams
from an expressive musical interface and distributing them to the
synthesizer and the renderer.

1. INTRODUCTION 
1.1. Sinusoidal Modeling 

Additive synthesis is among the oldest digital sound creation methods
and was the foundation of early experiments by Max Mathews at Bell
Labs. It allows the generation of sounds rich in timbre by
superimposing single sinusoidal components, referred to as partials,
either in the time or in the frequency domain. Based on the Fourier
principle, any quasi-periodic signal $y(t)$ can be expressed as a sum
of $N_{part}$ sinusoids with varying amplitudes $a_n(t)$ and
frequencies $\omega_n(t)$ and an individual phase offset $\varphi_n$:

$$y(t) = \sum_{n=1}^{N_{part}} a_n(t) \sin(\omega_n(t)\, t + \varphi_n) \qquad (1)$$

In the harmonic case, which applies to the majority of musical
instrument sounds, the partial frequencies can be approximated as
integer multiples of a fundamental frequency $f_0$:

$$y(t) = \sum_{n=1}^{N_{part}} a_n(t) \sin(2 \pi\, n\, f_0(t)\, t + \varphi_n) \qquad (2)$$

Although relative phase fluctuations are important for perception [1],
the original phase can be ignored in many cases, which is beneficial
for manipulations of the modeled sound:

$$y(t) = \sum_{n=1}^{N_{part}} a_n(t) \sin(2 \pi\, n\, f_0(t)\, t) \qquad (3)$$
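
As an illustration of Eq. (3), the following minimal sketch sums
harmonic partials sample by sample. It assumes a constant fundamental
frequency and constant partial amplitudes; the function and parameter
names are hypothetical and not taken from the presented application.
For a time-varying $f_0(t)$, a running phase accumulator would
typically replace the direct evaluation of the sine argument.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Evaluates Eq. (3) for a constant f0 and constant amplitudes.
// amplitudes[n - 1] holds a_n; all names are illustrative assumptions.
std::vector<double> synthesize_harmonic(const std::vector<double>& amplitudes,
                                        double f0_hz,
                                        double sample_rate_hz,
                                        std::size_t num_samples)
{
    assert(sample_rate_hz > 0.0);
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<double> y(num_samples, 0.0);
    for (std::size_t i = 0; i < num_samples; ++i) {
        const double t = static_cast<double>(i) / sample_rate_hz;
        // Sum over all partials n = 1 ... N_part.
        for (std::size_t n = 1; n <= amplitudes.size(); ++n) {
            y[i] += amplitudes[n - 1]
                  * std::sin(two_pi * static_cast<double>(n) * f0_hz * t);
        }
    }
    return y;
}
```

Calling `synthesize_harmonic({1.0, 0.5}, 440.0, 48000.0, 48000)` would
yield one second of a tone with two partials at 440 Hz and 880 Hz.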

Based on this theory, an algorithm for speech synthesis was proposed
by McAulay et al. [2]. For musical sound synthesis, the algorithm has
been extended by a noise component [3], resulting in the
sinusoids+noise model. The signal is then modeled as the sum of the
deterministic part $x_{det}$ and the stochastic part $x_{stoch}$, also
referred to as the residual:

$$x = x_{det} + x_{stoch} \qquad (4)$$

Modeling of residuals can, for example, be performed by approximating
the spectral envelope using linear predictive coding [3] or a filter
bank based on Bark frequencies [4]. In theory, the phase of the
stochastic signal is random and thus need not be modeled. However,
residuals are usually not completely random, since they still contain
information from the removed harmonic content.

In order to fully model the sounds of arbitrary musical instruments, a
transient component $x_{trans}$ is included [4] in the full signal
model. This component captures plucking sounds and other percussive
elements:

$$x = x_{det} + x_{stoch} + x_{trans} \qquad (5)$$

Since the work presented in this paper focuses on the violin in 
legato techniques, the transient component can be neglected without 
impairing the perceived quality of a re-synthesis. 

1.2. Spectral Spatialization 

In electronic and electroacoustic music, the term spectral
spatialization refers to the individual treatment of a sound's
frequency components for distribution on sound reproduction systems
[5]. Timbral sound qualities can thus be linked to the spatial image
of the sound, even for pre-existing or fixed sound material. In the
case of spectro-spatial synthesis, this process is integrated on the
synthesis level, for example in additive approaches. This is not yet a
common feature in available synthesizers, but several research
projects have investigated the possibilities of such approaches with
applications in musical sound processing, sound design, virtual
acoustics and psychoacoustics.

Topper et al. [6] apply additive synthesis of basic waveforms (square
wave, sawtooth), physical modeling and sub-band decomposition in a
multichannel panning system with real-time, prerecorded and graphic
control. Their system is implemented in MAX/MSP and RTcmix, running on
both Mac and PC/Linux hardware with a total of 8 audio channels.

Verron et al. [7] use the sinusoids+noise model for spectral
spatialization of environmental sounds. Each component can be
synthesized with an individual position in space on Ambisonics and
binaural systems. Deterministic and stochastic components are composed
and added together in the frequency domain and subsequently spatially
encoded with a filter bank. Control over the synthesis process depends
on the nature of the environmental sounds [8].

In the context of electroacoustic music, James [9] expands Dennis
Smalley's concept of spectromorphology to the idea of
spatiomorphology. Timbre spatialization is achieved using terrain
surfaces and by mapping these to spatio-spectral distributions.
Max/MSP is used for computing the contribution of spectral content to
individual speakers with distance-based amplitude panning (DBAP) and
Ambisonics equivalent panning (AEP) methods.

Figure 1: Partial amplitude trajectories of a violin sound

Figure 2: Partial frequency trajectories of a violin sound

Figure 3: Unwrapped partial phases of a violin sound

Figure 4: Bark band energy trajectories of a violin sound


Spectral spatialization can also be used to synthesize dynamic
directivity patterns of musical instruments in virtual acoustic
environments. Since the directivity in combination with movement has a
significant influence on an instrument's sound, this can increase the
plausibility. Warusfel et al. [10] use a tower with three cubes, each
containing multiple speakers, to spatialize frequency bands of an
input signal for the simulation of radiation patterns.

1.3. The Presented Application 

The presented application incorporates different synthesis modes, of 
which only the so called deterministic mode will be subject of this 
paper. In this basic mode, precalculated parameter trajectories, as 
presented in Sec. [2] are used for a manipulable resynthesis of the 
original instrument sounds. 

The software architecture is designed to allow the use of additive
synthesis, or more precisely of sinusoidal modeling, on sound field
synthesis systems or other reproduction setups. This is achieved by
providing individual outputs for all partials and noise bands in an
application implemented as a JACK client, described in Sec. 3. Using
JACK allows the connection of all individual synthesizer output
channels to a JACK-capable renderer, such as the SoundScape Renderer
(SSR) [11], Panoramix [12] or the HOA Library [13]. By making each
partial a single virtual sound source in these rendering applications,
the spatial distribution of the synthesis can be modulated in real
time. Pure Data [14] is used to receive control data from gestural
interfaces or to play back predefined trajectories, generating control
streams for both the synthesizer and the spatialization renderer. A
direct linkage between timbre and spatialization is thus created,
which is considered essential for a meaningful spectro-spatial
synthesis.

2. ANALYSIS

The TU-Note Violin Sample Library [15, 16] is used as audio content
for generating the sinusoidal model. Designed in the style of classic
sample libraries, this data set contains single sounds of a violin in
different pitches and intensities, recorded at an audio sampling rate
of 96 kHz with 24-bit resolution.

Analysis and modeling are performed beforehand in Matlab, using
monophonic pitch tracking and subsequent extraction of the partial
trajectories by peak picking in the spectrogram. YIN [17] and SWIPE
[18] are used as monophonic pitch tracking algorithms. Based on the
f0 trajectories, partial tracking is performed with the STFT, applying
a hop size of 256 samples (2.7 ms) and a window size of 4096 samples,
zero-padded to 8192 samples. Quadratic interpolation (QIFFT), as
presented by Smith et al. [19], is applied for peak parameter
estimation of up to 80 partials. Due to the sampling frequency, the
full number of partials is only analyzable up to the note D5
(576.65 Hz), since the 80th partial must remain below the Nyquist
frequency of 48 kHz.

Figure 5: Sequence diagram for the JACK callback function
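
The quadratic interpolation step can be sketched as follows. This is a
generic parabolic peak refinement over three neighboring spectral
bins, not the Matlab code used for the analysis; it assumes magnitudes
on a logarithmic (dB) scale, and all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

struct PeakEstimate {
    double bin_offset;    // fractional offset from the peak bin, in bins
    double magnitude_db;  // interpolated peak height
};

// Fits a parabola through the peak bin and its two neighbors.
// Requires center_db to be a strict local maximum.
PeakEstimate quadratic_peak(double left_db, double center_db, double right_db)
{
    const double curvature = left_db - 2.0 * center_db + right_db;
    assert(curvature < 0.0);  // negative curvature at a local maximum
    const double p = 0.5 * (left_db - right_db) / curvature;
    return {p, center_db - 0.25 * (left_db - right_db) * p};
}

// Converts a peak bin plus fractional offset to a frequency in Hz.
double bin_to_hz(std::size_t bin, double bin_offset,
                 double sample_rate_hz, std::size_t fft_size)
{
    assert(fft_size > 0);
    return (static_cast<double>(bin) + bin_offset) * sample_rate_hz
         / static_cast<double>(fft_size);
}
```

With the analysis settings stated above (96 kHz, 8192-point FFT), one
bin corresponds to about 11.7 Hz, so the fractional offset refines the
frequency estimate within that spacing.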

By subtracting the deterministic part from the complete sound in the
time domain, the residual signal is obtained. The residual is then
filtered using a Bark scale filter bank with second-order Chebyshev
bandpass filters, and the temporal energy trajectories are calculated
for the resulting 24 band-limited signals. At this point, a large
amount of information is removed from the residual signal. Due to the
shortcomings of the time domain subtraction method, the residual still
contains information from the deterministic component. By averaging
the energy over the Bark bands, this relation is eliminated.
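
A temporal energy trajectory for one band-limited signal could be
computed as in the following sketch. The exact energy measure and
frame layout used in the analysis are not specified here, so the
mean-square energy over non-overlapping frames is an assumption, as
are all names; the bandpass filtering itself is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns one mean-square energy value per complete frame of hop_size
// samples. band_signal is assumed to be the output of one Bark band
// filter; trailing samples that do not fill a frame are ignored.
std::vector<double> energy_trajectory(const std::vector<double>& band_signal,
                                      std::size_t hop_size)
{
    assert(hop_size > 0);
    std::vector<double> energy;
    for (std::size_t start = 0; start + hop_size <= band_signal.size();
         start += hop_size) {
        double sum = 0.0;
        for (std::size_t i = start; i < start + hop_size; ++i) {
            sum += band_signal[i] * band_signal[i];
        }
        energy.push_back(sum / static_cast<double>(hop_size));
    }
    return energy;
}
```

Applying this function to each of the 24 band signals with a hop size
of 256 samples would give one trajectory per Bark band on the same
time grid as the partial trajectories.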

Results of the analysis stage are the trajectories of the partial
amplitudes, shown in Figure 1, the trajectories of the partial
frequencies and phases, shown in Figure 2 and Figure 3, respectively,
and the trajectories of the Bark band energies, illustrated in
Figure 4. The resulting data is exported to an individual YAML file
for each sound, which can be read by the synthesis system.

3. SYNTHESIS SYSTEM 

3.1. Libraries 

The synthesis application is designed as a standalone Linux command
line software. The main functionality of the synthesis system relies
on the JACK API¹ for audio connectivity and on liblo², or more
precisely the liblo C++ wrapper, for receiving control signals.
libyaml-cpp³ is used for reading the data of the modeled sounds and
the relevant configuration files. libsndfile⁴, for reading the
original sound files, and FFTW⁵ are included but not relevant for the
aspects presented in this paper. Frequency domain synthesis and sample
playback are partially implemented but not used at this point.

1 http://jackaudio.org/

2 https://github.com/radarsat1/liblo

3.2. Algorithm 

Both the sinusoidal and the noise component are synthesized in the
time domain, using a non-overlapping method. For the sinusoidal
component, either the built-in sin() function of the cmath library or
a custom lookup table can be selected. The choice does not
significantly affect the overall performance. The filter bank for the
noise synthesis consists of 24 second-order Chebyshev bandpass filters
with fixed coefficients, calculated before runtime. The amplitude of
each frequency band is driven by the previously analyzed energy
trajectories.
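
The lookup table alternative to sin() can be illustrated with the
following sketch. It is not the table used in the application; the
table size, the linear interpolation between entries and the class
name are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sine lookup table with linear interpolation between entries.
class SineTable {
public:
    explicit SineTable(std::size_t size) : table_(size + 1)
    {
        assert(size >= 2);
        const double two_pi = 2.0 * std::acos(-1.0);
        // One extra entry closes the cycle for the interpolation.
        for (std::size_t i = 0; i <= size; ++i) {
            table_[i] = std::sin(two_pi * static_cast<double>(i)
                                 / static_cast<double>(size));
        }
    }

    // Phase is given in cycles; any real value is wrapped to [0, 1).
    double operator()(double phase_cycles) const
    {
        const std::size_t size = table_.size() - 1;
        const double wrapped = phase_cycles - std::floor(phase_cycles);
        const double pos = wrapped * static_cast<double>(size);
        // Clamp guards against wrapped rounding up to exactly 1.0.
        const std::size_t idx =
            std::min(static_cast<std::size_t>(pos), size - 1);
        const double frac = pos - static_cast<double>(idx);
        return table_[idx] + frac * (table_[idx + 1] - table_[idx]);
    }

private:
    std::vector<double> table_;
};
```

With 4096 entries, the interpolation error of such a table stays below
1e-6, which is one reason why the choice between both variants can be
a matter of performance rather than audible quality.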

During synthesis, the algorithm reads a new set of support points 
from the model data for each audio buffer and increments the posi¬ 
tion within the played note. Figure [5] shows a sequence diagram 
for the deterministic synthesis algorithm, starting at the JACK call¬ 
back function, which is executed for each buffer of the JACK audio 
server. Since the synth is designed to enable polyphonic play, the
voice manager object handles incoming OSC messages in the function
update_voices() to activate or deactivate single voices.

3 https://github.com/jbeder/yaml-cpp/

4 http://www.mega-nerd.com/libsndfile/

5 http://www.fftw.org/

Proceedings of the 17th Linux Audio Conference (LAC-19), CCRMA, Stanford University, USA, March 23-26, 2019



Figure 6: Combination of synthesizer and renderer on separate
machines, using Pure Data for synth configuration and parameter
parsing


For the synthesis of mostly monophonic, continuously excited
instruments like the violin, the polyphony merely handles the
overlapping of released notes. Subsequently, the voice manager loops
over all active voices in the function getNextFrame_TD(), first
setting the new control parameters for each voice.

In cycle_start_deterministic(), support points for the parameters of
all partials are picked at the relevant voice's playback position.
These support points are then linearly interpolated over the buffer
length in set_interpolator().

Finally, in getNextBlock_TD(), each single voice generates the output
for all sinusoids and all noise bands in two separate vectorizable
loops, adding both to the output buffer.
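
The per-buffer interpolation between support points can be sketched
for the sinusoidal part as follows. This is not the code of the
presented application: the struct, the function names and the use of a
running phase accumulator are assumptions, and the noise bands are
omitted. Each partial writes to its own buffer, mirroring the
individual outputs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// State of one partial at the start of a buffer (hypothetical layout).
struct PartialState {
    double amplitude;
    double frequency_hz;
    double phase;  // radians, kept in [0, 2*pi)
};

// Renders one buffer; amplitude and frequency are linearly
// interpolated from the current state to the new support point.
void render_partial_block(PartialState& state, double target_amp,
                          double target_freq_hz, double sample_rate_hz,
                          std::vector<float>& out)
{
    assert(sample_rate_hz > 0.0 && !out.empty());
    const double two_pi = 2.0 * std::acos(-1.0);
    const double n = static_cast<double>(out.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
        const double frac = static_cast<double>(i + 1) / n;
        const double amp =
            state.amplitude + (target_amp - state.amplitude) * frac;
        const double freq =
            state.frequency_hz + (target_freq_hz - state.frequency_hz) * frac;
        out[i] = static_cast<float>(amp * std::sin(state.phase));
        state.phase += two_pi * freq / sample_rate_hz;
        while (state.phase >= two_pi) state.phase -= two_pi;
    }
    state.amplitude = target_amp;
    state.frequency_hz = target_freq_hz;
}

// Renders all partials of one voice into individual output buffers.
void render_voice_block(std::vector<PartialState>& partials,
                        const std::vector<PartialState>& targets,
                        double sample_rate_hz,
                        std::vector<std::vector<float>>& outputs)
{
    assert(partials.size() == targets.size());
    assert(partials.size() == outputs.size());
    for (std::size_t k = 0; k < partials.size(); ++k) {
        render_partial_block(partials[k], targets[k].amplitude,
                             targets[k].frequency_hz, sample_rate_hz,
                             outputs[k]);
    }
}
```

Because each partial only touches its own state and buffer, the loop
over partials has no dependencies between iterations, which is the
property that makes such a loop a candidate for vectorization.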

3.3. Runtime Environment and Periphery 

The runtime system for the synthesis starts a JACK server with a
sampling rate of 48 kHz, a buffer size of 128 samples and 2 periods
per buffer. This results in a latency of 5.3 ms for the audio
playback, which is within the limits for this synthesis approach. On
an Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz with disabled
speed-stepping and a Fireface UFX, the JACK server shows an average
load of approximately 20 %.
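
The stated latency follows from the JACK settings, assuming that it
refers to the total buffered duration of both periods:

```latex
t_{latency} = \frac{N_{periods} \cdot N_{buffer}}{f_s}
            = \frac{2 \cdot 128}{48000\,\mathrm{Hz}}
            \approx 5.3\,\mathrm{ms}
```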

The interaction of the involved software components is visualized in
Figure 6. For reasons of performance and increased flexibility in the
studio, two separate machines are used for synthesis and
spatialization. Connectivity between the systems is realized with MADI
or DANTE, using individual channels for the 80 partials and 24 noise
bands.


3.4. Control 



Figure 7: Spatialization scene in a 2D setup with 30 partials and their 
positions 

The control data for the partial positions in the rendering software
is not generated in the synthesis system at this point but is managed
externally. This offers more flexibility for testing different
mappings at this stage of development. A Pure Data patch is used to
receive incoming control messages, either from OSC or MIDI, and
distribute them to the synthesizer and the spatialization software.
For live performance, the patch receives continuous control streams
for pitch and intensity from an improved version of the interface
presented by von Coler et al. [20] and visualizes the sensor data.
Pitch and intensity are forwarded directly to the synth. Additionally,
data from several force-sensitive resistors (FSR) and a 9 degrees of
freedom IMU, which can be used for controlling the spatialization, is
sent to the patch.

Figure 7 shows an example of a simple spatialization mapping on a 2D
system. The absolute orientation of the IMU is used to control the
general direction of the partial flock. A second parameter S, derived
from the intensity and additional sensor data, controls the spread of
the partials around this angle, depending on the partial index.
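
One conceivable mapping of this kind is sketched below. It is purely
hypothetical and not the mapping used for Figure 7: the partials are
fanned out linearly by index around a center angle, and the total
width of the fan stands in for the spread parameter S. All names are
assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical mapping: distributes num_partials azimuth angles
// (radians) symmetrically around center_rad. spread_rad is the total
// width of the fan; a single partial sits exactly at the center.
std::vector<double> partial_angles(double center_rad, double spread_rad,
                                   std::size_t num_partials)
{
    assert(num_partials > 0);
    std::vector<double> angles(num_partials);
    for (std::size_t n = 0; n < num_partials; ++n) {
        // Normalized offset in [-0.5, 0.5], depending on the index.
        const double offset = (num_partials > 1)
            ? static_cast<double>(n) / static_cast<double>(num_partials - 1) - 0.5
            : 0.0;
        angles[n] = center_rad + spread_rad * offset;
    }
    return angles;
}
```

In a setup like the one described, such angles would be converted to
source positions and sent to the renderer as control messages, with
the center angle following the IMU orientation.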

4. CONCLUSION 

After significantly improving the performance of the synthesis system,
the application can now be used with the full 80 partials and 24 Bark
bands as individual outputs. Recent tests in combination with
different spatial rendering software and different loudspeaker setups
show promising results. However, the dynamic spatialization of such a
number of virtual sound sources and the resulting traffic of OSC
messages is demanding for the runtime system. Using separate machines
for synthesis and rendering reduces the individual load. The number of
rendering inputs can also be reduced without limiting the perceived
quality of the spatialization, since multiple partials may share one
virtual sound source.

Next steps are now possible, which include the empirical investigation
of mappings from controller sensors to both the spectral and spatial
sound properties. This includes user experiments to evaluate different
mapping and control paradigms, as well as perceptual measurements of
the synthesis results.